BEYOND THE RUNS THEOREM


JOHANNES FISCHER, STEPAN HOLUB, TOMOHIRO I, 
AND MOSHE LEWENSTEIN 


Abstract. In [2], a short and elegant proof was presented showing that
a binary word of length $n$ contains at most $n - 3$ runs. Here we show,
using the same technique and a computer search, that the number of
runs in a binary word of length $n$ is at most $\frac{22}{23} n < 0.957 n$.


1. Introduction 

The research on the possible (maximal) number of runs in a word of
length $n$ dates back at least to [3]. Since then, there have been two types of
efforts: finding words rich in runs, and proving an upper bound on the
number of runs. Both efforts were accompanied by heavy use of computer
search. An (at least psychologically) important barrier was the question of
whether the number of runs can be larger than the length of the word; the
negative answer was known as “the runs conjecture”. The barrier was
broken, turning the conjecture into a theorem, by a remarkably simple and
computer-free proof in [2]. In this paper we continue narrowing the
gap between the two bounds. We build essentially on the technique leading
to the beautiful proof of the Runs Theorem, again adding some computer
backing.

For a more detailed description of the history of the problem and for
an extensive list of literature, see for example [?].

2. Runs and Lyndon roots 

For any word $u$, an integer $p$ with $1 \le p \le |u|$ is said to be a period of $u$
if $u[i] = u[i+p]$ for all $1 \le i \le |u| - p$. The smallest period of $u$
is called the period of $u$. A prefix $v$ of $u$ that is also a suffix of $u$ is said to
be a border of $u$. The empty word and $u$ are trivial borders of $u$. We call $u$
unbordered if it has no border other than the trivial ones.
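The definitions of period and border can be checked mechanically on small words. The following is a minimal Python sketch; the function names are illustrative choices and not part of the paper, and Python's 0-based indexing replaces the 1-based positions used in the text.

```python
def periods(u):
    """All periods p of a nonempty word u with 1 <= p <= |u| (0-based indexing)."""
    n = len(u)
    return [p for p in range(1, n + 1)
            if all(u[i] == u[i + p] for i in range(n - p))]

def smallest_period(u):
    """The period of u in the sense of the text: its smallest period."""
    return periods(u)[0]

def nontrivial_borders(u):
    """Borders of u other than the empty word and u itself."""
    n = len(u)
    return [u[:k] for k in range(1, n) if u[:k] == u[n - k:]]

def is_unbordered(u):
    return not nontrivial_borders(u)
```

For instance, `periods("10101")` lists 2, 4 and 5, and each border of length k corresponds to the period |u| - k.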

Given a word $w$, we say that an interval $[i..j]$ with $1 \le i \le j \le |w|$ is
period-maximal in $w$ if $w[i..j]$ has no extension in $w$ with the same period.
That is, if $1 \le i' \le i \le j \le j' \le |w|$ is such that $w[i..j]$ and $w[i'..j']$ have the
same period, then $i = i'$ and $j = j'$. A period-maximal interval is said to
be left-open if $i = 1$, otherwise it is left-closed. Similarly, a period-maximal
interval is right-open or right-closed depending on whether or not $j = |w|$.
If $1 < i$ and $j < |w|$, the interval is said to be closed. A period-maximal
interval is a run if its length is at least twice the period $p$ of $w[i..j]$, that
is, $j - i + 1 \ge 2p$.
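As an illustration of these definitions, the following Python sketch enumerates period-maximal intervals and runs by brute force. It is meant only for experimenting with short words; the function names are assumptions made here, and intervals are reported as 1-based triples (i, j, p) to match the text.

```python
def smallest_period(u):
    n = len(u)
    return next(p for p in range(1, n + 1)
                if all(u[x] == u[x + p] for x in range(n - p)))

def period_maximal_intervals(w):
    """All (i, j, p), 1-based and inclusive, such that [i..j] is period-maximal."""
    n = len(w)
    out = []
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            p = smallest_period(w[i - 1:j])
            # It suffices to test one-letter extensions to the left and right.
            left = i > 1 and smallest_period(w[i - 2:j]) == p
            right = j < n and smallest_period(w[i - 1:j + 1]) == p
            if not left and not right:
                out.append((i, j, p))
    return out

def runs(w):
    """Period-maximal intervals whose length is at least twice the period."""
    return [(i, j, p) for (i, j, p) in period_maximal_intervals(w)
            if j - i + 1 >= 2 * p]
```

On the word 1110101101 this reports four runs, while the interval [2..10] with period 5 is period-maximal but too short to be a run.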

We shall work with the two-letter alphabet $\{0,1\}$, which allows two
lexicographic orders: $\prec_0$ is defined by $0 \prec_0 1$, and $\prec_1$ by $1 \prec_1 0$. We shall write
$\bar{a} = 1 - a$. A word $v$ is said to be a Lyndon word with respect to some order
$\prec$ if and only if $v \prec u$ for any nonempty proper suffix $u$ of $v$. In particular,
Lyndon words are unbordered. We say that a Lyndon word $v$ is a Lyndon
root of $w$ if $v$ is a factor of $w$ and $|v|$ is the period of $w$.
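The two orders and the Lyndon condition can be tested directly from the definition. This is a small Python sketch; the names `less` and `is_lyndon` and the convention of passing the smaller letter `a` as a string are assumptions made for illustration.

```python
def less(x, y, a):
    """True if x is strictly smaller than y in the lexicographic order
    in which the letter a is the smaller one (a proper prefix is smaller)."""
    key = lambda s: [0 if c == a else 1 for c in s]
    return key(x) < key(y)

def is_lyndon(v, a):
    """v is a Lyndon word w.r.t. the order with smallest letter a:
    v is strictly smaller than each of its nonempty proper suffixes."""
    return len(v) > 0 and all(less(v, v[h:], a) for h in range(1, len(v)))
```

With this convention, `is_lyndon("011", "0")` and `is_lyndon("110", "1")` both hold, while 101 is a Lyndon word for neither order.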

A right-closed period-maximal interval $[i..j]$ of $w$ is said to be $a$-broken in
$w$, where $a = w[j+1]$. We will also say, a bit imprecisely, that the period
of $w[i..j]$ is broken by $a$.

Let $\rho(n,2)$ denote the maximal number of runs in a binary word of length $n$.

The basic idea of [2] is to associate an $a$-broken run $r = [i..j]$ with the set
$A(r)$ of intervals corresponding to the Lyndon root of $w[i..j]$ with respect to the
order $a \prec \bar{a}$, excluding from $A(r)$, if necessary, the interval starting at the
beginning of $r$. This definition has to be completed to cover also runs that
are not broken, that is, right-open runs. For those runs, the set $A(r)$ can
be defined as consisting of Lyndon roots with respect to both orders. In [2],
the case of unbroken runs is solved by appending a special symbol \$ to the
end of $w$, which is equivalent to arbitrarily choosing one of the orders (the
order $0 \prec 1$ in their case).

Let $\mathrm{Beg}(S)$ denote the set of starting positions of intervals in the set
$S$, and let $B(r) = \mathrm{Beg}(A(r))$ for any run $r$. The crucial fact, implying
immediately that there are at most $|w| - 1$ runs, is that $B(r)$ and $B(r')$
are disjoint for $r \ne r'$. At no cost, it is possible to make this basic tool a bit
stronger. For the sake of clarity, let us first give a formal definition.

Definition 1. Let $w$ be a binary word. Let $s = [i..j]$ be a period-maximal
interval in $w$ with period $p$. Then $A(s)$ denotes the set of all intervals $[i'..j']$
of length $p$ such that $i < i' \le j' \le j$ and $w[i'..j']$ is a Lyndon word with
respect to an order $\prec$ satisfying the following condition: if $j < |w|$ and
$[i..j]$ is $a$-broken in $w$, then $a \prec \bar{a}$ (the condition being empty if $[i..j]$ is not
broken). Also, let $B(s) = \mathrm{Beg}(A(s))$.

Example 1. Take a word $w = 1110101101$ of length 10. For a period-maximal
interval $s_1 = [1..3]$ with period 1, $A(s_1) = \{[2..2], [3..3]\}$. For a
period-maximal interval $s_2 = [3..7]$ with period 2, $A(s_2) = \{[5..6]\}$. For a
period-maximal interval $s_3 = [5..10]$ with period 3, $A(s_3) = \{[6..8], [7..9]\}$. Note
that $w[6..8] = 011$ and $w[7..9] = 110$ are Lyndon words w.r.t. $\prec_0$ and
$\prec_1$, respectively. For a period-maximal interval $s_4 = [2..10]$ with period 5,
$A(s_4) = \{[4..8]\}$.
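The sets in Example 1 can be recomputed from Definition 1. The sketch below is a direct, unoptimized transcription under the assumption that the interval (i, j) and its period p are supplied by the caller; the function name `A` mirrors the text, the helper name `is_lyndon` is an illustrative choice.

```python
def is_lyndon(v, a):
    """Lyndon test w.r.t. the order in which the letter a is the smaller one."""
    key = lambda s: [0 if c == a else 1 for c in s]
    return all(key(v) < key(v[h:]) for h in range(1, len(v)))

def A(w, i, j, p):
    """A(s) of Definition 1 for the period-maximal interval s = [i..j]
    (1-based, inclusive) with period p, as a list of 1-based intervals."""
    n = len(w)
    # If s is broken by a = w[j+1], only the order with a smallest is allowed;
    # if s is right-open, both orders are allowed.
    orders = [w[j]] if j < n else ["0", "1"]   # w[j] is the letter at position j+1
    return [(s, s + p - 1)
            for s in range(i + 1, j - p + 2)   # starting positions i < s <= j-p+1
            if any(is_lyndon(w[s - 1:s + p - 1], a) for a in orders)]

w = "1110101101"
print(A(w, 1, 3, 1), A(w, 3, 7, 2), A(w, 5, 10, 3), A(w, 2, 10, 5))
```

The starting positions of the returned intervals form the set B(s).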

The following lemma is stronger than the corresponding [2, Lemma 8]
in two ways. First, it applies also to period-maximal intervals that are
not runs, and second, as noted above, $A(r)$ is defined more generously for
unbroken runs. The proof, however, is the same.

Lemma 1. Let $s$ and $t$ be two distinct period-maximal intervals in $w$. Then
$B(s)$ and $B(t)$ are disjoint.

Proof. Let $s = [i_s..j_s]$ and $t = [i_t..j_t]$. Suppose that $k \in B(s) \cap B(t)$, and let
$[k..m_s] \in A(s)$ and $[k..m_t] \in A(t)$. If $m_s = m_t$, then $s$ and $t$ have the same
period and $s = t$. We can therefore, w.l.o.g., suppose that $m_s < m_t$. Then
$w[k..j_s]$ has a smaller period than the unbordered $w[k..m_t]$, which implies
that $j_s < m_t$. Therefore $s$ is $a$-broken with $a = w[j_s + 1]$.

Since $a$ breaks the period of $w[k..j_s]$, we have $w[m_s+1..j_s+1] \prec_a w[k..j_s+1]$.
Since both $w[m_s+1..j_s+1]$ and $w[k..j_s+1]$ are factors of $w[k..j_t]$, we
deduce that $w[k..m_t]$ is Lyndon w.r.t. $\prec_{\bar{a}}$. Note that $w[k..j_t]$ contains both
letters. Therefore $w[k] = \bar{a}$, and the $\prec_a$-minimality of $w[k..m_s]$ implies that
$w[i_s..j_s] \in \bar{a}^+$. The definition of $A(s)$ yields $i_s < k$ and $i_t < k$, which leads
to a contradiction with the $\prec_{\bar{a}}$-minimality of $w[k..m_t]$. □

Example 2. It is worth noting that the appearance of $\bar{a}^+$ in the previous
proof is significant, and it is the place where we use the prohibition of the
very first position of a run. Without this condition, Lemma 1 would not
hold. Consider the word 1101011 and position 2, which is the starting point
of the Lyndon root 1 of the run 11 and the starting point of the Lyndon
root 10 of 10101, the latter being excluded by the prohibition.
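Lemma 1 and Example 2 can be explored experimentally on short words. The following brute-force Python sketch is an illustration and not the authors' code; the flag `exclude_start` is a hypothetical switch that turns the prohibition of the first position on or off.

```python
from itertools import product

def smallest_period(u):
    n = len(u)
    return next(p for p in range(1, n + 1)
                if all(u[x] == u[x + p] for x in range(n - p)))

def is_lyndon(v, a):
    key = lambda s: [0 if c == a else 1 for c in s]
    return all(key(v) < key(v[h:]) for h in range(1, len(v)))

def period_maximal_intervals(w):
    n, out = len(w), []
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            p = smallest_period(w[i - 1:j])
            if i > 1 and smallest_period(w[i - 2:j]) == p:
                continue
            if j < n and smallest_period(w[i - 1:j + 1]) == p:
                continue
            out.append((i, j, p))
    return out

def B(w, i, j, p, exclude_start=True):
    """Starting positions of the intervals in A(s) for s = [i..j] with period p."""
    orders = [w[j]] if j < len(w) else ["0", "1"]
    first = i + 1 if exclude_start else i
    return {s for s in range(first, j - p + 2)
            if any(is_lyndon(w[s - 1:s + p - 1], a) for a in orders)}

def collisions(w, exclude_start=True):
    """Positions that lie in B(s) for more than one period-maximal interval s."""
    seen, bad = set(), set()
    for (i, j, p) in period_maximal_intervals(w):
        b = B(w, i, j, p, exclude_start)
        bad |= seen & b
        seen |= b
    return bad

def lemma1_holds_up_to(max_len):
    """Exhaustively check disjointness of the sets B(s) for all short binary words."""
    return all(not collisions("".join(t))
               for n in range(1, max_len + 1)
               for t in product("01", repeat=n))
```

For the word 1101011, `collisions` returns the empty set, whereas with `exclude_start=False` position 2 is claimed by both the run 11 and the run 10101, as described in Example 2.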

Lemma 1 implies that for each position $k$ there is at most one
period-maximal interval $s$ such that $k \in B(s)$. Such an $s$ can be found using the
following rules.

Lemma 2. Let $k > 1$ be a position of $w$ such that $w[k] = a$ and
$w[k-1..|w|] \ne \bar{a}a^+$. Then $k \in B(s)$ where $s = [i..j]$ is the period-maximal
extension of

• $[k..k]$, if $w[k] = w[k-1]$;

• $[k..k']$, where $w[k..k']$ is the longest Lyndon word with respect to $\prec_a$
starting at the position $k$, otherwise.

Proof. If $w[k] = w[k-1]$, then $s$ is a run with period one containing the
position $k$ with $i < k$. Hence, $k \in B(s)$ immediately follows from Definition 1.

Let $w[k-1] = \bar{a}$, and let $w[k..k']$ be the longest Lyndon word with respect
to $\prec_a$ starting at the position $k$. From $w[k..|w|] \ne a^+$, it is easy to see that
$k' \ne k$ and $w[k'] = \bar{a}$, which implies $i < k$. If $s$ is right-open, we are done:
$k \in B(s)$ since the condition on $\prec$ is empty (see Definition 1). It remains
to show that $s$ is $a$-broken if it is broken. Assume to the contrary that $s$ is
$\bar{a}$-broken. We show that $w[k..j+1]$ is a Lyndon word with respect to $\prec_a$.
Let $p$ denote the length of the Lyndon word $w[k..k']$, that is, $p = k' - k + 1$.
Let first $k < h \le k'$. Since $w[k..k']$ is a Lyndon word with respect to $\prec_a$, we
have $w[k..k'] \prec_a w[h..k']$, and thus also $w[k..j+1] \prec_a w[h..j+1]$. Let now
$k' < h \le j+1$. As above, $w[k..k'] \preceq_a w[h-p+1..h] \prec_a w[h-p+1..j+1]$.
Also $w[h-p+1..j+1] \prec_a w[h..j+1]$, since $w[h..j+1] = w[h..j]\bar{a}$ and
$w[h..j]a$ is a prefix of $w[h-p+1..j+1]$. Therefore, $w[k..j+1]$ is a Lyndon
word, which contradicts that $w[k..k']$ is the longest Lyndon word starting at
the position $k$. □

Note that for the position $k$ with $w[k-1..|w|] = \bar{a}a^+$, there is no
period-maximal interval $s$ with $k \in B(s)$. An algorithm computing for all positions
the longest Lyndon words starting there is discussed in [2, Section 4.1].
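The rules of Lemma 2 translate into a short procedure. The Python sketch below uses a brute-force search for the longest Lyndon word rather than the algorithm of [2, Section 4.1]; the names `longest_lyndon_at`, `maximal_extension` and `owner` are hypothetical and chosen only for this illustration. Positions are 1-based, and k > 1 is assumed.

```python
def is_lyndon(v, a):
    key = lambda s: [0 if c == a else 1 for c in s]
    return all(key(v) < key(v[h:]) for h in range(1, len(v)))

def longest_lyndon_at(w, k, a):
    """Largest k' such that w[k..k'] is a Lyndon word w.r.t. the order
    with smallest letter a (brute force; a single letter always qualifies)."""
    return max(e for e in range(k, len(w) + 1) if is_lyndon(w[k - 1:e], a))

def maximal_extension(w, k, e):
    """Extend [k..e] to the left and right while the period e-k+1 is preserved."""
    p = e - k + 1
    i, j = k, e
    while i > 1 and w[i - 2] == w[i - 2 + p]:
        i -= 1
    while j < len(w) and w[j] == w[j - p]:
        j += 1
    return i, j

def owner(w, k):
    """The interval s with k in B(s) according to Lemma 2, or None if
    w[k-1..|w|] consists of one letter followed only by copies of w[k]."""
    a = w[k - 1]
    if w[k - 2] == a:                 # first rule: w[k] = w[k-1]
        return maximal_extension(w, k, k)
    if set(w[k - 1:]) == {a}:         # excluded case of Lemma 2
        return None
    return maximal_extension(w, k, longest_lyndon_at(w, k, a))
```

On the word of Example 1, `owner` assigns position 5 to the interval [3..7] and position 4 to the interval [2..10], in agreement with the sets listed there.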
3. Idle positions


In order to make explicit the relation between runs and positions, we
associate with a run $r$ the position $\max B(r)$ and say that such a position
is charged (by $r$). We repeat that the Runs Theorem was proved in [2] by
pointing out that charging is an injective mapping, which is a corollary of
Lemma 1. This also yields an obvious strategy for further lowering the upper
bound on the number of runs: one has to find positions that are not charged
in an arbitrary word. We shall call such positions idle. Equivalently, we want
to identify a position $i$ satisfying either of the following two conditions.

(1) $i$ is not contained in $B(r)$ for any run $r$, or

(2) $i$ is in $B(r) \setminus \{\max B(r)\}$ for some run $r$.
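Charged and idle positions can be computed by brute force for short words, which makes the counting argument concrete: the number of runs equals the word length minus the number of idle positions. The following Python sketch is an illustration under the definitions above; all function names are assumptions of this example.

```python
def smallest_period(u):
    n = len(u)
    return next(p for p in range(1, n + 1)
                if all(u[x] == u[x + p] for x in range(n - p)))

def is_lyndon(v, a):
    key = lambda s: [0 if c == a else 1 for c in s]
    return all(key(v) < key(v[h:]) for h in range(1, len(v)))

def runs(w):
    """All runs (i, j, p) of w, 1-based and inclusive."""
    n, out = len(w), []
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            p = smallest_period(w[i - 1:j])
            if j - i + 1 < 2 * p:
                continue
            if i > 1 and smallest_period(w[i - 2:j]) == p:
                continue
            if j < n and smallest_period(w[i - 1:j + 1]) == p:
                continue
            out.append((i, j, p))
    return out

def B(w, i, j, p):
    orders = [w[j]] if j < len(w) else ["0", "1"]
    return {s for s in range(i + 1, j - p + 2)
            if any(is_lyndon(w[s - 1:s + p - 1], a) for a in orders)}

def idle_positions(w):
    """Positions of w that are not charged by any run."""
    charged = {max(B(w, i, j, p)) for (i, j, p) in runs(w)}
    return [k for k in range(1, len(w) + 1) if k not in charged]
```

For 1010011 the three runs charge positions 2, 5 and 7, so positions 1, 3, 4 and 6 are idle.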


3.1. Idle positions that are resistant to extensions. In order to be
able to estimate the number of idle positions locally, we are interested in
idle positions that remain idle in any extension of $w$. One obvious fact is
that closed period-maximal intervals are not affected by extensions. For
example, the third position in the word 1010011 remains idle for any
extension. That is because the period three of 1001 is broken by 1, and the
period-maximal extension of 1001 is $s = [2..6]$, which is closed, but $s$ is not a
run, and Definition 1 and Lemma 1 yield that the position is idle.

Also, it is easy to see that runs $r$ with $|B(r)| > 1$ that are right-closed
preserve this property in any extension. However, we have to be careful with
right-open runs since some positions in $B(r)$ may disappear when the run $r$
gets broken by a right-extension. To clarify this case, let $A_a(r)$ denote the
set of Lyndon roots in $A(r)$ that are Lyndon words with respect to $\prec_a$, and
let $B_a(r) = \mathrm{Beg}(A_a(r))$. Note that $B_a(r) = B_{\bar{a}}(r)$ if and only if $r$ is a run
with period one. Now we consider the set $D(w)$ of idle positions $k$ in a word
$w$ falling into one of the following cases:

(a) $k \in B(s)$, where $s$ is a closed period-maximal interval that is not a run.

(b) $k \in (B_a(r) \setminus \{\max B_a(r)\})$, where $r$ is an $a$-broken run.

(c) $k \in (B_a(r) \setminus \{\max B_a(r)\})$, where $r$ is a right-open run and $a$ is chosen
such that $\min B_a(r) > \min B_{\bar{a}}(r)$ ($a \in \{0,1\}$ is arbitrary if its period is
1).

The intention behind $D(w)$ is that, for any $k \in D(w)$, the position $|u| + k$ in
$uwv$ is idle for any extensions $u$ and $v$. The only exception is the case (c) in
which the position $|u| + k$ may not be idle if $r$ is $\bar{a}$-broken in the extension.
But even in this case we have that at least one of the positions $|u| + k$ and
$|u| + k - g$ of $uwv$ is idle, where $g = \min B_a(r) - \min B_{\bar{a}}(r)$. Therefore
the number of idle positions does not decrease for any extensions. This is
formulated in the following claim.


Claim 1. Let $w$, $u$ and $v$ be arbitrary binary words. Then

$$|D(uwv) \cap [|u|+2..|uw|-1]| \ge |D(w)|.$$


Proof. We examine $k \in D(w)$ of each case:

• For Case (a). Since $s = [i..j]$ is closed, we have a closed
period-maximal interval $s' = [|u|+i..|u|+j]$ in $uwv$. Since $s'$ is not a run,
$|u| + k$ is in $D(uwv)$.

• For Case (b). Since $r = [i..j]$ is an $a$-broken run, we have an $a$-broken
run $r' = [i'..|u|+j]$ with $i' \le |u| + i$ in $uwv$. Since any Lyndon root
in $A(r)$ appears in $A(r')$ (with shift $|u|$), $|u| + k$ is in $D(uwv)$.

• For Case (c). Let $r = [i..|w|]$ be a right-open run in $w$. We have a
run $r' = [i'..j']$ with $i' \le |u| + i$ and $|uw| \le j'$ in $uwv$. Note that
$k - g \in B_{\bar{a}}(r)$, where $g = \min B_a(r) - \min B_{\bar{a}}(r)$.

  - If $r'$ is still open or $a$-broken, any Lyndon root in $A_a(r)$ appears
    in $A_a(r')$ (with shift $|u|$), and hence, $|u| + k$ is in $D(uwv)$.

  - If $r'$ is $\bar{a}$-broken, any Lyndon root in $A_{\bar{a}}(r)$ appears in $A_{\bar{a}}(r')$
    (with shift $|u|$), and hence, $|u| + k - g$ is in $D(uwv)$.

We have described an injective map from $D(w)$ to $D(uwv) \cap [|u|+1..|uw|]$.
The map always assigns, for some $a$ and some $r$, a position $k$ in $[1..|w|] \cap B_a(r)$
to the position $k + |u|$. Taking into account that 1 and $|w|$ are not in any
$B_a(r)$, we get the claim. □

This yields the following lemma: 


Lemma 3. If $|D(w)| \ge d$ for any binary word $w$ of length $m$, then

$$\lim_{n\to\infty} \frac{\rho(n,2)}{n} \le \frac{m - 2 - d}{m - 2}.$$

Proof. Let $y = a y_1 y_2 \cdots$ be an infinite binary word, where $a$ is a letter, and
$|y_i| = m - 2$ for each $i$. By Claim 1, each interval corresponding to a factor
$y_i$ in $y$ contains at least $d$ idle positions. The claim follows. □


3.2. Idle positions that are resistant to left extensions. We further
identify positions that remain idle when we consider “only” left extensions,
which only comes into play in Section 5 to estimate the number of idle
positions in a suffix of a word. Formally, for any word $w$ we define the set
$D'(w)$ of idle positions $k$ in $w$ falling into one of the following cases:

(A) $k = \max B(s)$, where $s$ is a left-closed period-maximal interval that is
not a run.

(B) $k \in (B(s) \setminus \{\max B(s)\})$, where $s$ is a period-maximal interval (which
is possibly a run).

(C) $w[k-1..|w|] = \bar{a}a^+$ holds.

Note that $D(w) \subseteq D'(w)$. Since we do not consider right-extensions, we
can show the following claim, which is a bit stronger than Claim 1 for $D$.

Claim 2. Let $w$ and $u$ be arbitrary binary words. For any $k \in D'(w)$,
$|u| + k \in D'(uw)$.

Proof. We examine $k \in D'(w)$ of each case:

• For Case (A). Since $s = [i..j]$ is left-closed, we have a left-closed
period-maximal interval $s' = [|u|+i..|u|+j]$ in $uw$. Since $s'$ is not
a run, $|u| + k$ is in $D'(uw)$.

• For Case (B). Let $s = [i..j]$. We have a period-maximal interval
$s' = [i'..|u|+j]$ with $i' \le |u| + i$ in $uw$. Since any Lyndon root in
$A(s)$ appears in $A(s')$ (with shift $|u|$), $|u| + k$ is in $D'(uw)$.

• For Case (C). Since $\bar{a}a^+$ stays a suffix of $uw$, $|u| + k$ is in $D'(uw)$.

□
Algorithm 1: Computing $m_d$.

Input: A positive integer $d$.

Output: $m_d$.

    m <- 0                      // m is a global variable
    Extend(0)
    return m

procedure Extend(w):
    if |w| > m then m <- |w|
    compute D(w)
    if |D(w)| >= d then return
    foreach a in {0,1} do
        Extend(wa)


Considering that $1 \notin D'(w)$, we get:

Claim 3. Let $w$ and $u$ be arbitrary binary words. Then

$$|D'(uw) \cap [|u|+2..|uw|]| \ge |D'(w)|.$$


4. Computer search 

Given a positive integer $d$, Algorithm 1 computes the minimum integer
$m_d$ such that $|D(w)| \ge d$ for any binary word $w$ of length $m_d$. The algorithm
traverses words by appending characters to the right. If $|D(w)| \ge d$, we stop
the extension since $|D(wv)| \ge d$ for any word $v$.
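The control flow of Algorithm 1 can be sketched in Python as follows. This is only a skeleton of the search: the function computing $D(w)$ is passed in as a parameter, and `toy_D` below is a deliberately artificial stand-in, not the set $D(w)$ of the paper, used solely to show that the search terminates and returns the expected length. An explicit stack replaces the recursion, and the search starts from the one-letter word 0 as in Algorithm 1.

```python
def compute_m(d, D, start="0"):
    """Depth-first search over right extensions, as in Algorithm 1.
    D is any function mapping a word to a set of positions such that
    the size of D(w) never decreases under right extension."""
    m = 0
    stack = [start]
    while stack:
        w = stack.pop()
        m = max(m, len(w))
        if len(D(w)) >= d:
            continue                    # stop extending this word
        stack.append(w + "0")
        stack.append(w + "1")
    return m

def toy_D(w):
    """Hypothetical stand-in for D(w): interior positions divisible by 3.
    It only serves to exercise compute_m."""
    return {k for k in range(2, len(w)) if k % 3 == 0}
```

With the stand-in, `compute_m(1, toy_D)` returns 4 and `compute_m(2, toy_D)` returns 7. Plugging in an implementation of the real $D(w)$ would give the values $m_d$, at a much higher computational cost.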

If we already know the value $m_{d'}$ for some $d' < d$, then the following
pruning of the search space can be employed: If $|D(w) \cap [1..m - m_{d'} + 1]| \ge
d - d'$, then we stop the extension. That is because for any word $z$ of length
$m_{d'}$, $D(z)$ contains at least $d'$ positions (and $1 \notin D(z)$), and hence, for
any word $v$ of length $m - |w|$, $D(wv)$ contains at least $d - d'$ positions in
$[1..m - m_{d'} + 1]$ and at least $d'$ positions in $[m - m_{d'} + 2..m]$. Namely, $D(wv)$
contains at least $d - d' + d' = d$ positions, and hence, any right extension of
the current $w$ cannot lead to an update of $m$.

By computing $m_d$ and using Lemma 3 we obtained upper bounds for
$\lim_{n\to\infty} \rho(n,2)/n$ given in Table 1.
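The bounds follow from the values $m_d$ by the formula of Lemma 3. The Python sketch below recomputes them with exact rational arithmetic; the dictionary `M` simply copies the $m_d$ column of Table 1, and the names `M` and `bound` are choices made for this illustration.

```python
from fractions import Fraction

# Values m_d for d = 1..20, copied from Table 1.
M = {1: 63, 2: 96, 3: 126, 4: 150, 5: 172, 6: 194, 7: 216, 8: 237,
     9: 258, 10: 274, 11: 295, 12: 314, 13: 332, 14: 351, 15: 369,
     16: 388, 17: 407, 18: 425, 19: 444, 20: 462}

def bound(d):
    """Upper bound (m_d - 2 - d) / (m_d - 2) given by Lemma 3."""
    m = M[d]
    return Fraction(m - 2 - d, m - 2)

for d in sorted(M):
    print(d, M[d], bound(d), float(bound(d)))
```

For example, $d = 1$ gives $60/61$ and $d = 20$ gives $440/460 = 22/23$.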


5. Upper bound for finite words 

We now prove that we can omit the limit in the bounds in Table 1. That
is, we verify that, for any $d \le 20$, $\rho(n,2)/n \le (m_d - 2 - d)/(m_d - 2)$ does
hold for any $n$.

Let $y$ be a finite word and let $p_1, p_2, \ldots, p_\ell$ be the list of idle positions
of $y$. Note that $p_1 = 1$. For a given $d$ we define

$$s_k = s_k(y,d) := [p_{(k-1)d+1}..p_{kd+1} - 1] \quad \text{for } k = 1, 2, \ldots, \lceil \ell/d \rceil - 1,$$

$$s_k = s_k(y,d) := [p_{(k-1)d+1}..|y|] \quad \text{for } k = \lceil \ell/d \rceil.$$

In other words, we make a disjoint decomposition of the interval $[1..|y|]$ into
subintervals $s_k$ such that each $s_k$ starts with an idle position of $y$, and each
$s_k$, except maybe the last one, contains exactly $d$ idle positions.
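The decomposition into the intervals $s_k$ is easy to carry out once the idle positions are known. The following Python sketch takes the sorted list of idle positions as input; the function name `decompose` and the example lists of idle positions are illustrative assumptions.

```python
def decompose(idle, n, d):
    """Split [1..n] into the intervals s_k defined in the text.
    idle is the sorted list p_1 < p_2 < ... of idle positions with p_1 = 1."""
    K = -(-len(idle) // d)                  # ceil(len(idle) / d)
    intervals = []
    for k in range(1, K + 1):
        start = idle[(k - 1) * d]           # p_{(k-1)d+1}
        end = idle[k * d] - 1 if k < K else n   # p_{kd+1} - 1, or |y| for the last
        intervals.append((start, end))
    return intervals
```

For instance, with idle positions 1, 3, 4, 6 in a word of length 7 and $d = 2$, the decomposition is [1..3], [4..7]; each interval starts at an idle position and contains exactly two of them.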
Table 1. Upper bounds of $\lim_{n\to\infty} \rho(n,2)/n$.

   d    m_d    bound on lim rho(n,2)/n    m_d - m_{d-1}
   1     63    0.98360655737...           63
   2     96    0.97872340425...           33
   3    126    0.97580645161...           30
   4    150    0.97297297297...           24
   5    172    0.97058823529...           22
   6    194    0.96875                    22
   7    216    0.96728971962...           22
   8    237    0.96595744680...           21
   9    258    0.96484375                 21
  10    274    0.96323529411...           16
  11    295    0.96245733788...           21
  12    314    0.96153846153...           19
  13    332    0.96060606060...           18
  14    351    0.95988538681...           19
  15    369    0.95912806539...           18
  16    388    0.95854922279...           19
  17    407    0.95802469135...           19
  18    425    0.95744680851...           18
  19    444    0.95701357466...           19
  20    462    0.95652173913...           18


We first claim that all intervals $s_k$, $k < \lceil \ell/d \rceil$, have length at most $m_d - 2$.
Suppose that the length of some $s_k = [i..j]$ is at least $m_d - 1$ and consider
the word $y[i..j+1]$ of length $m_d$. By the definition of $m_d$ and by Claim 1, the
cardinality of $D(y) \cap [i+1..j]$ is at least $d$, which means that $[i..j]$ contains
at least $d + 1$ idle positions, a contradiction.

It remains to count idle positions in the tail of the word $y$, that is, in
the interval $s_{\lceil \ell/d \rceil}$. By an argument similar to the one above, one can see
that the length of the interval is at most $m_d - 1$. Let $z$ denote the suffix in
question, that is, $z = y[p_{(\lceil \ell/d \rceil - 1)d+1}..|y|]$. Since we only have to consider
left-extensions of $z$, we now use $D'(z)$ to estimate the number of idle positions.
Since $1 \notin D'(z)$ and the first position of $z$ is idle in $y$, our goal is to
show

$$\frac{|z| - |D'(z)| - 1}{|z|} \le \frac{m_d - 2 - d}{m_d - 2}. \qquad (*)$$

Let $d = 20$. Then the right-hand side of $(*)$ is $22/23$. We first note that
$(x - 1)/x \le 22/23$ for each $x \le 23$. Therefore, we can assume $|z| > 23$.

A simple computer search verified that $|D'(w)| \ge 3$ for any word $w$ with
$|w| \ge 13$, which means there are at least 3 idle positions in the last 12
positions of $w$ that are resistant to left extensions. Let now $z = z_1 z_2$ with
$|z_2| = 12$. If $m_i - 1 \le |z_1| < m_{i+1} - 1$ (where $m_0 := 0$), then $z$ has at least
$i$ idle positions in $[2..|z_1|]$ by Claim 1 and hence, $|D'(z)| \ge i + 3$. Using the
results in Table 1, a direct calculation yields that, for each $i = 0, 1, \ldots, 19$,
if $m_i - 1 \le |z_1| < m_{i+1} - 1$, then

$$\frac{|z| - |D'(z)| - 1}{|z|} \le \frac{(m_{i+1} - 2 + 12) - i - 3 - 1}{m_{i+1} - 2 + 12} \le \frac{22}{23}.$$

Therefore we get the following result.

$$\rho(n,2)/n \le \frac{22}{23} = 0.\overline{9565217391304347826086}.$$


6. Conclusion 

Searches in the literature for words with a high number of runs yield words
with approximately $0.944n$ runs, where $n = |w|$; see [?]. Therefore,
the optimal multiplicative constant is somewhere between 0.944 and 0.957.
The lower bound corresponds to words where on average about every 18th
position is idle. This seems to fit very well with the eventual distances
between $m_{d-1}$ and $m_d$ in Table 1. It is therefore reasonable to expect that
the optimal density of runs is close to the lower bound, maybe around
$1 - 1/18.5 \approx 0.946$.